---
title: Core
keywords: fastai
sidebar: home_sidebar
summary: "Core functionality for the fastai audio library."
description: "Core functionality for the fastai audio library."
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %}

Audio Signals

AudioGetter

This section regroups the basic types used in audio with the transforms that create objects of those types.

{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

get_audio_files[source]

get_audio_files(path, recurse=True, folders=None)

Get audio files in path recursively, only in folders, if specified.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

AudioGetter[source]

AudioGetter(suf='', recurse=True, folders=None)

Create get_audio_files partial function that searches path suffix suf and passes along kwargs, only in folders, if specified.

{% endraw %} {% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

tar_extract_at_filename[source]

tar_extract_at_filename(fname, dest)

Extract fname to dest/fname.name folder using tarfile

{% endraw %} {% raw %}
p = untar_data(URLs.SPEAKERS10, extract_func=tar_extract_at_filename)
{% endraw %} {% raw %}
audio_getter = AudioGetter("", recurse=True, folders=None)
{% endraw %} {% raw %}
files = audio_getter(p)  # files will load differently on different machines so we specify examples by name
ex_files = [p/f for f in ['m0005_us_m0005_00218.wav',
                          'f0003_us_f0003_00279.wav',
                          'f0001_us_f0001_00168.wav',
                          'f0005_us_f0005_00286.wav']]
{% endraw %}

AudioItem

{% raw %}
{% endraw %} {% raw %}

class AudioTensor[source]

AudioTensor(x, sr, **kwargs) :: TensorBase

{% endraw %}

Indexing is patched to retain the AudioTensor type, so indexing or slicing an AudioTensor returns another AudioTensor rather than a plain tensor.
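The idea behind the patch can be sketched with a minimal `torch.Tensor` subclass (`AudioLike` is a hypothetical stand-in, not the library's class):

```python
import torch

class AudioLike(torch.Tensor):
    """Hypothetical stand-in for AudioTensor, only to illustrate the patch."""

_orig_getitem = torch.Tensor.__getitem__

def _getitem_retain_type(self, idx):
    # re-wrap the result so indexing doesn't decay to a plain torch.Tensor
    res = _orig_getitem(self, idx)
    return res.as_subclass(type(self)) if isinstance(res, torch.Tensor) else res

AudioLike.__getitem__ = _getitem_retain_type

t = torch.ones(10).as_subclass(AudioLike)
print(type(t[0]).__name__, type(t[2:5]).__name__)  # AudioLike AudioLike
```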

{% raw %}
{% endraw %} {% raw %}
AudioTensor(torch.ones(10), sr=100)
AudioTensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
{% endraw %} {% raw %}
{% endraw %} {% raw %}

show_audio_signal[source]

show_audio_signal(ai, ctx, **kwargs)

{% endraw %} {% raw %}
item0 = AudioTensor.create(ex_files[0])
{% endraw %} {% raw %}
item0.shape
torch.Size([1, 58240])
{% endraw %} {% raw %}
item0.sr, item0.nchannels, item0.nsamples, item0.duration
(16000, 1, 58240, 3.64)
{% endraw %} {% raw %}
test_eq(type(item0.data), torch.Tensor)
test_eq(item0.sr, 16000)
test_eq(item0.nchannels, 1)
test_eq(item0.nsamples, 58240)
test_eq(item0.duration, 3.64)
{% endraw %} {% raw %}
item0[0]
AudioTensor([ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -9.1553e-05,
        -6.1035e-05,  0.0000e+00])
{% endraw %} {% raw %}
item0.show()
{% endraw %} {% raw %}
item1 = AudioTensor.create(files[1]);
{% endraw %} {% raw %}
item0.show()
item1.show()
{% endraw %} {% raw %}
#get 3 equal length portions of 3 different signals so we can stack them
#for a fake multichannel example
ai0, ai1, ai2 = map(AudioTensor.create, ex_files[1:4]);
min_samples = min(ai0.nsamples, ai1.nsamples, ai2.nsamples)
s0, s1, s2 = map(lambda x: x[:,:min_samples], (ai0, ai1, ai2))
{% endraw %} {% raw %}
tst0 = AudioTensor(torch.ones(10), sr=120)
tst1 = AudioTensor(torch.ones(10), sr=150)
(tst0 + tst1).sr
120
{% endraw %} {% raw %}
test_eq(s0.shape, s1.shape)
test_eq(s1.shape, s2.shape)
{% endraw %} {% raw %}
fake_multichannel = AudioTensor(torch.stack((s0, s1, s2), dim=1).squeeze(0), sr=16000)
{% endraw %} {% raw %}
test_eq(fake_multichannel.nchannels, 3)
test_eq(fake_multichannel.nsamples, 53760)
{% endraw %} {% raw %}
fake_multichannel.show()
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class OpenAudio[source]

OpenAudio(items) :: Transform

Delegates (__call__,decode,setup) to (encodes,decodes,setups) if split_idx matches

{% endraw %}

The repr of a Transform is:
`classname: self.use_as_item {self.encodes} {self.decodes}`
encodes and decodes are TypeDispatch objects; their reprs are the string form of a dict whose key/value pairs are a type name and the function that handles that type.

{% raw %}
oa = OpenAudio(files); oa
OpenAudio: (object,object) -> encodes (object,object) -> decodes
{% endraw %} {% raw %}
#demonstrate functionality of OpenAudio.encodes, the rest of the nb will
#use files that are opened by name for reproducibility/testing
oa = OpenAudio(files)
item100 = oa.encodes(100)
item100.show()
{% endraw %} {% raw %}
#test open audio on a random set of files
for i in range(10):
    idx = random.randint(0, len(files)-1)
    test_eq_type(oa.encodes(idx), AudioTensor.create(files[idx]))
    test_eq_type(oa.decodes(idx), files[idx])
{% endraw %} {% raw %}
type(oa)
__main__.OpenAudio
{% endraw %} {% raw %}
oa.encodes(0)
AudioTensor([[ 0.0000,  0.0000,  0.0000,  ..., -0.0002, -0.0004, -0.0007]])
{% endraw %} {% raw %}
oa.decodes(0)
Path('/home/condor/.fastai/data/ST-AEDS-20180100_1-OS/m0001_us_m0001_00077.wav')
{% endraw %} {% raw %}
oa.items[0]
Path('/home/condor/.fastai/data/ST-AEDS-20180100_1-OS/m0001_us_m0001_00077.wav')
{% endraw %}

Create functions to wrap TorchAudio

{% raw %}
{% endraw %}

Add AudioBlock

{% raw %}
{% endraw %} {% raw %}

AudioBlock[source]

AudioBlock()

{% endraw %}

Audio Spectrograms

Note:
Overriding getattr to store the settings isn't ideal, but if we dump them all in as attributes by doing `x.__dict__.update(settings)` we then can't easily pass settings along when we apply a transform and create a new AudioSpectrogram object. Potential fixes are:
1. Keeping both a settings dict and updating the object's dict with all its attributes (this feels dirty)
2. Finding a way to implement deepcopy for AudioSpectrogram so that we can clone it efficiently
3. Dumping the spectrogram settings and having a method that collects them, so they can be passed to the constructor when we make a new AudioSpectrogram object in a transform
**Update: option 2 is now reasonable because we mutate in place and don't need to pass settings forward**
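Option 2 could look something like the following sketch (`SgLike` is a hypothetical stand-in for AudioSpectrogram, not the library's implementation):

```python
import copy
import torch

class SgLike(torch.Tensor):
    """Hypothetical stand-in for AudioSpectrogram carrying a settings dict."""

def make_sg(data, settings):
    sg = data.as_subclass(SgLike)
    sg._settings = dict(settings)
    return sg

def _sg_deepcopy(self, memo):
    # clone the tensor data, then deep-copy the settings dict alongside it
    new = self.clone().as_subclass(SgLike)
    new._settings = copy.deepcopy(self._settings, memo)
    return new

SgLike.__deepcopy__ = _sg_deepcopy

sg = make_sg(torch.zeros(2, 3), {"n_mels": 64, "hop_length": 128})
sg2 = copy.deepcopy(sg)
print(sg2._settings == sg._settings, sg2._settings is sg._settings)  # True False
```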

AudioSpectrogram Class

{% raw %}
{% endraw %} {% raw %}

class AudioSpectrogram[source]

AudioSpectrogram(x, **kwargs) :: TensorImageBase

{% endraw %}
TO-DO:
1. Get colorbar and axes working for multiplot display
2. Have someone who knows matplotlib better clean up/refactor
3. Plotting the spectrogram forces it to a uniform size; we may want to display either the shape of the image, or display it to scale with something like `plt.figure(figsize=(sg.width/30, sg.height/30))`
{% raw %}
{% endraw %} {% raw %}

show_spectrogram[source]

show_spectrogram(sg, ax, ctx, figsize, **kwargs)

{% endraw %}

Spectrogram Generation: AudioToSpec

{% raw %}
{% endraw %} {% raw %}

class AudioToSpec[source]

AudioToSpec(pipe, settings) :: Transform

Delegates (__call__,decode,setup) to (encodes,decodes,setups) if split_idx matches

{% endraw %} {% raw %}
{% endraw %} {% raw %}

SpectrogramTransformer[source]

SpectrogramTransformer(mel=True, to_db=True)

{% endraw %} {% raw %}

fill_pipeline[source]

fill_pipeline(transform_list, sg_type, **kwargs)

Adds correct args to each transform

{% endraw %} {% raw %}

warn_unused[source]

warn_unused(all_kwargs, used_kwargs)

{% endraw %}
Note:
The `function_list += f(**usable_kwargs)` approach only works if all args are keyword arguments; it doesn't work for unnamed args. We could add a get-usable-args step that checks whether the default is `inspect._empty`. This also needs more tests.
Note:
If a function (e.g. specshow) accepts **kwargs, this won't pass the extra arguments along, because specshow doesn't accept all kwargs and will break if you pass in unexpected ones. Since we have no way of knowing which functions it delegates to, we can't pull out the relevant kwargs, so if there is something we know it accepts as a kwarg, like "cmap", we need to pass it in manually.

get_usable_kwargs takes a function and a dictionary of kwargs that may or may not be relevant to that function, and returns a dictionary of all of that function's default values, updated with the kwargs that can be successfully applied. This is useful for three reasons: first, it allows us to combine multiple functions into a single AudioToSpec Transform while passing each only the appropriate kwargs; second, it allows us to keep a dictionary of the settings used to create the spectrogram, which is sometimes used in its display and cropping; and third, it allows us to warn the user when they pass in invalid or unused kwargs.
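As a sketch of the idea (not the library's exact implementation), that behavior can be reproduced with `inspect`:

```python
import inspect

def usable_kwargs_sketch(func, kwargs, exclude=None):
    """Hypothetical re-implementation of get_usable_kwargs, for illustration."""
    exclude = set(exclude or [])
    params = inspect.signature(func).parameters
    # start from func's own defaults (keyword args only) ...
    defaults = {k: p.default for k, p in params.items()
                if p.default is not inspect.Parameter.empty and k not in exclude}
    # ... then overlay only the caller's kwargs that func actually accepts
    return {**defaults, **{k: v for k, v in kwargs.items() if k in defaults}}

def demo(a: int = 10, b: int = 20): pass

print(usable_kwargs_sketch(demo, {'z': 0, 'a': 1, 'b': 2, 'c': 3}))  # {'a': 1, 'b': 2}
```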

{% raw %}
{% endraw %} {% raw %}

get_usable_kwargs[source]

get_usable_kwargs(func, kwargs, exclude=None)

{% endraw %}

Example: testing with a function that only takes a and b as kwargs

{% raw %}
def test_kwargs(a:int=10, b:int=20): pass

kwargs = {'a':1, 'b':2}
extra_kwargs = {'z':0, 'a':1, 'b':2, 'c':3}
test_eq(get_usable_kwargs(test_kwargs,       kwargs    ), kwargs)
test_eq(get_usable_kwargs(test_kwargs, extra_kwargs, []), kwargs)
{% endraw %} {% raw %}
item0 = AudioTensor.create(ex_files[0])
{% endraw %} {% raw %}
DBMelSpec = SpectrogramTransformer(mel=True, to_db=True)
{% endraw %} {% raw %}
a2s = DBMelSpec(n_fft=2048, hop_length=128, n_mels=64, baloney="hi")
/home/condor/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:45: UserWarning: baloney is not a valid arg name and was not used
{% endraw %} {% raw %}
type(item0)
__main__.AudioTensor
{% endraw %} {% raw %}
item0.__dict__
{'_meta': {'sr': 16000}}
{% endraw %}

{% raw %}
a2s
AudioToSpec: (AudioTensor,object) -> encodes 
{% endraw %} {% raw %}
sg = a2s(item0)
{% endraw %} {% raw %}
item0.__dict__
{'_meta': {'sr': 16000}}
{% endraw %} {% raw %}
type(item0)
__main__.AudioTensor
{% endraw %} {% raw %}
item0.shape
torch.Size([1, 58240])
{% endraw %} {% raw %}
sg.shape
torch.Size([1, 64, 456])
{% endraw %} {% raw %}
sg.settings
{'mel': True,
 'to_db': True,
 'sample_rate': 16000,
 'n_fft': 2048,
 'win_length': 2048,
 'hop_length': 128,
 'f_min': 0.0,
 'f_max': None,
 'pad': 0,
 'n_mels': 64,
 'window_fn': <function _VariableFunctions.hann_window>,
 'wkwargs': None,
 'stype': 'power',
 'top_db': None,
 'sr': 16000,
 'nchannels': 1}
{% endraw %} {% raw %}
sg.show()
{% endraw %}

Display and Testing

{% raw %}
# get a sg with weird settings for testing
item0 = AudioTensor.create(ex_files[0])
item1 = AudioTensor.create(ex_files[1])
a2s = DBMelSpec(f_max = 20000, n_mels=137)
sg = a2s(item0)
sg1 = a2s(item1)
{% endraw %} {% raw %}
sg.show()
sg1.show()
{% endraw %} {% raw %}
sg_mc = a2s(fake_multichannel)
{% endraw %} {% raw %}
sg_mc.show()
{% endraw %} {% raw %}
sg.shape
torch.Size([1, 137, 114])
{% endraw %} {% raw %}
sg._settings
{'mel': True,
 'to_db': True,
 'sample_rate': 16000,
 'n_fft': 1024,
 'win_length': 1024,
 'hop_length': 512,
 'f_min': 0.0,
 'f_max': 20000,
 'pad': 0,
 'n_mels': 137,
 'window_fn': <function _VariableFunctions.hann_window>,
 'wkwargs': None,
 'stype': 'power',
 'top_db': None,
 'sr': 16000,
 'nchannels': 1}
{% endraw %} {% raw %}
sg.nchannels, sg.height, sg.width
(1, 137, 114)
{% endraw %} {% raw %}
#test the explicit settings were properly stored in the spectrogram object and can be accessed as attributes
test_eq(sg.f_max, 20000)
test_eq(sg.hop_length, 512)
test_eq(sg.sr, item100.sr)
test_eq(sg.mel, True)
test_eq(sg.to_db, True)
test_eq(sg.nchannels, 1)
test_eq(sg.height, 137)
test_eq(sg.n_mels, sg.height)
test_eq(sg.width, 114)
{% endraw %} {% raw %}
defaults = {k:v.default for k, v in inspect.signature(_GenMelSpec).parameters.items()}
a2s = DBMelSpec(f_max =20000, hop_length=345)
sg = a2s(item100)
test_eq(sg.n_mels, defaults["n_mels"])
test_eq(sg.n_fft , 1024)
test_eq(sg.shape[1], sg.n_mels)
test_eq(sg.hop_length, 345)
{% endraw %} {% raw %}
# test the spectrogram and audio have same duration, both are computed
# on the fly as transforms can change their duration
test_close(sg.duration, item100.duration, eps=0.1)
{% endraw %}

Test if spectrograms are right-side up

{% raw %}
a2s_5hz = DBMelSpec(
    sample_rate=16000,
    n_fft=1024,
    win_length=1024,
    hop_length=512,
    f_min=0.0,
    f_max=20000,
    pad=0,
    n_mels=137,
)
{% endraw %} {% raw %}
sine_5hz = torch.Tensor([0.5 * np.cos(2 * np.pi * 5 * np.arange(0, 1.0, 1.0/16000))])
{% endraw %} {% raw %}
at_5hz = AudioTensor(sine_5hz, 16000)
{% endraw %} {% raw %}
sg_5hz = a2s_5hz(at_5hz)
{% endraw %} {% raw %}
sg_5hz.show()
{% endraw %} {% raw %}
# testing to make sure the lowest bin of the spectrogram has the highest value/most energy
max_row = sg_5hz.max(dim=1).indices.mode().values.item()
assert max_row < 2
{% endraw %}

Test warnings for missing/extra arguments

{% raw %}
SHOW_W=True
{% endraw %} {% raw %}
#test warning for unused argument 'power' for melspec
#tests AudioToSpec and its from_cfg class method
voice_mel_cfg = {'mel':True, 'to_db':True, 'n_fft':2560, 'f_max':22050., 'n_mels':128, 'hop_length':256, 'power':2}
test_warns(lambda: AudioToSpec.from_cfg(voice_mel_cfg), show=SHOW_W)
<class 'UserWarning'>: power is not a valid arg name and was not used
{% endraw %} {% raw %}
test_warns(lambda: DBMelSpec(power=2, n_fft=2560, f_max=22050, n_mels=128), show=SHOW_W)
<class 'UserWarning'>: power is not a valid arg name and was not used
{% endraw %} {% raw %}
#test for unused arguments 'f_max' and 'n_mels' for non-mel Spectrogram
voice_mel_cfg = {'mel':False, 'to_db':True, 'f_max':22050., 'n_mels':128, 'n_fft':2560, 'hop_length':256, 'power':2}
test_warns(lambda: AudioToSpec.from_cfg(voice_mel_cfg), show=SHOW_W)
<class 'UserWarning'>: n_mels is not a valid arg name and was not used
<class 'UserWarning'>: f_max is not a valid arg name and was not used
{% endraw %} {% raw %}
#test warning for unused argument 'top_db' when db conversion not done
voice_mel_cfg = {'mel':True, 'to_db':False, 'top_db':20, 'n_fft':2560, 'f_max':22050., 'n_mels':128, 'hop_length':256}
test_warns(lambda: AudioToSpec.from_cfg(voice_mel_cfg), show=SHOW_W)
<class 'UserWarning'>: top_db is not a valid arg name and was not used
{% endraw %} {% raw %}
#test warning for invalid argument 'doesntexist'
voice_mel_cfg = {'mel':True, 'to_db':True,'doesntexist':True, 'n_fft':2560, 'f_max':22050., 'n_mels':128, 'hop_length':256}
test_warns(lambda: AudioToSpec.from_cfg(voice_mel_cfg), show=SHOW_W)
<class 'UserWarning'>: doesntexist is not a valid arg name and was not used
{% endraw %}

AudioToSpec Timing Tests

{% raw %}
a_to_db_mel = SpectrogramTransformer()()
a_to_nondb_mel = SpectrogramTransformer(to_db=False)()
a_to_db_nonmel = SpectrogramTransformer(mel=False)()
a_to_nondb_non_mel = SpectrogramTransformer(mel=False, to_db=False)()
a_to_db_mel_hyperparams = SpectrogramTransformer()(n_fft=8192, hop_length=128)
{% endraw %} {% raw %}
%%timeit -n10
a_to_db_mel(item0)
2.78 ms ± 62.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %} {% raw %}
%%timeit -n10
a_to_nondb_mel(item0)
2.37 ms ± 86.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %} {% raw %}
%%timeit -n10
a_to_nondb_mel(item0)
2.17 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %} {% raw %}
%%timeit -n10
a_to_db_nonmel(item0)
2.09 ms ± 350 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %} {% raw %}
%%timeit -n10
a_to_nondb_non_mel(item0)
1.73 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %} {% raw %}
%%timeit -n10
# Time can blow up as a factor of n_fft and hop_length. n_fft is best kept to a power of two, hop_length
# doesn't matter except smaller = more time because we have more chunks to perform STFTs on
a_to_db_mel_hyperparams(item0)
55.6 ms ± 3.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %}

AudioToSpec Timing Tests as audio length scales

{% raw %}
import time
def time_variable_length_audios(f, max_seconds=30, sr=16000, channels=1):
    times = []
    audios = [AudioTensor(torch.randn(channels, sr*i), sr) for i in range(1,max_seconds+1,2)]
    for a in audios:
        start = time.time()
        out = f(a)
        end = time.time()
        times.append(round(1000*(end-start), 2))
    return times
{% endraw %} {% raw %}
%%time
a2s = SpectrogramTransformer()()
max_seconds = 180
times_mono = time_variable_length_audios(f=a2s, max_seconds=max_seconds)
times_stereo = time_variable_length_audios(f=a2s, max_seconds=max_seconds, channels=2)
plt.plot(np.arange(0,max_seconds,2), times_mono, label="mono")
plt.plot(np.arange(0,max_seconds,2), times_stereo, label="stereo")
plt.legend(['mono','stereo'])
plt.title("Time Taken by AudioToSpec")
plt.xlabel("Audio Duration in Seconds")
plt.ylabel("Processing Time in ms")
CPU times: user 34.5 s, sys: 4.54 s, total: 39 s
Wall time: 25.7 s
Text(0, 0.5, 'Processing Time in ms')
{% endraw %}

MFCC Generation

Issue:
MFCC is based on a melspectrogram, so it accepts many of the same arguments, but instead of being passed explicitly they are passed as a dict via melkwargs. As a result, in the current state the MFCC has no info about the hop_length (which determines the width) it was generated with. One option is grabbing the defaults from _GenMelSpec inside AudioToMFCC and passing them into the sg_settings. On the other hand, this could be an argument for lumping everything into AudioToSpec, including MFCC, which would give us the same access to _GenMelSpec arguments for tab-completion. We could also give AudioToMFCC a second delegation to _GenMelSpec, then parse the MelSpec arguments ourselves and bundle them into melkwargs before passing them to torchaudio. That would break our concept of wrapping external functions in internal references like _GenMelSpec, because we'd no longer be agnostic to how they're implemented. One last note: melkwargs will not accept extra keywords, only the ones that torchaudio.transforms.MelSpectrogram expects.
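The first option above can be sketched with stdlib `inspect` alone; `melspec_stub` is a hypothetical stand-in for _GenMelSpec, and its names/defaults are illustrative:

```python
import inspect

def melspec_stub(n_fft=1024, hop_length=None, n_mels=128):
    """Stand-in for _GenMelSpec; parameter names/defaults here are illustrative."""

def mfcc_sg_settings(melkwargs=None):
    # grab the melspectrogram defaults, then overlay whatever the user packed
    # into melkwargs, so hop_length etc. survive into the sg settings
    defaults = {k: p.default for k, p in
                inspect.signature(melspec_stub).parameters.items()}
    return {**defaults, **(melkwargs or {})}

print(mfcc_sg_settings({'hop_length': 512}))
# {'n_fft': 1024, 'hop_length': 512, 'n_mels': 128}
```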
{% raw %}
{% endraw %} {% raw %}

class AudioToMFCC[source]

AudioToMFCC(sample_rate=16000, n_mfcc=40, dct_type=2, norm='ortho', log_mels=False, melkwargs=None) :: Transform

Delegates (__call__,decode,setup) to (encodes,decodes,setups) if split_idx matches

{% endraw %} {% raw %}
item0 = AudioTensor.create(ex_files[0])
a2mfcc = AudioToMFCC()
mfcc = a2mfcc(item0)
test_eq(mfcc.n_mfcc, mfcc.data.shape[1])
{% endraw %} {% raw %}
mfcc.show()
{% endraw %} {% raw %}
mfcc._settings
{'sr': 16000,
 'nchannels': 1,
 'sample_rate': 16000,
 'n_mfcc': 40,
 'dct_type': 2,
 'norm': 'ortho',
 'log_mels': False,
 'melkwargs': None}
{% endraw %} {% raw %}
mfcc.height
40
{% endraw %} {% raw %}
mfcc.width
292
{% endraw %} {% raw %}
#n_mfcc specified should determine the height of the mfcc
item1 = AudioTensor.create(ex_files[1])
n_mfcc = 67
a2mfcc67 = AudioToMFCC(n_mfcc=n_mfcc)
mfcc67 = a2mfcc67(item1)
test_eq(mfcc67.shape[1], n_mfcc)
print(mfcc67.shape)
mfcc67.show()
torch.Size([1, 67, 426])
{% endraw %}

Example of passing in melkwargs

{% raw %}
a2mfcc_kwargs = AudioToMFCC(melkwargs={"hop_length":1024, "n_fft":1024})
mfcc_kwargs = a2mfcc_kwargs(item1)
mfcc_kwargs.show()
# make sure a new hop_length changes the resulting width
test_ne(mfcc_kwargs.width, mfcc.width)
{% endraw %}

MFCC Timing Tests

{% raw %}
%%time
a2mfcc = AudioToMFCC()
max_seconds = 180
times_mono = time_variable_length_audios(f=a2mfcc, max_seconds=max_seconds)
times_stereo = time_variable_length_audios(f=a2mfcc, max_seconds=max_seconds, channels=2)
plt.plot(np.arange(0,max_seconds,2), times_mono, label="mono")
plt.plot(np.arange(0,max_seconds,2), times_stereo, label="stereo")
plt.legend(['mono','stereo'])
plt.title("Time Taken by AudioToMFCC")
plt.xlabel("Audio Duration in Seconds")
plt.ylabel("Processing Time in ms")
CPU times: user 42.9 s, sys: 2.69 s, total: 45.6 s
Wall time: 28.9 s
Text(0, 0.5, 'Processing Time in ms')
{% endraw %}

Example Pipelines

DB MelSpectrogram Pipe (Standard)

{% raw %}
mel_cfg = {'n_fft':2560,'hop_length':64}
oa = OpenAudio(files)
a2s = DBMelSpec(**mel_cfg)
db_mel_pipe = Pipeline([oa,a2s])
for i in range(5):
    print("Shape:", db_mel_pipe(i).shape)
    db_mel_pipe.show(db_mel_pipe(i))
Shape: torch.Size([1, 128, 1331])
Shape: torch.Size([1, 128, 891])
Shape: torch.Size([1, 128, 841])
Shape: torch.Size([1, 128, 1491])
Shape: torch.Size([1, 128, 1631])
{% endraw %}

Raw Spectrogram (non-mel, non-db) Pipe

{% raw %}
cfg = {'mel':False, 'to_db':False, 'hop_length':128, 'n_fft':400}
oa = OpenAudio(files)
a2s = AudioToSpec.from_cfg(cfg)
db_mel_pipe = Pipeline([oa, a2s])
for i in range(3):
    print("Shape:", db_mel_pipe(i).shape)
    db_mel_pipe.show(db_mel_pipe(i))
    test_eq(db_mel_pipe(i).hop_length, cfg["hop_length"])
Shape: torch.Size([1, 201, 666])
Shape: torch.Size([1, 201, 446])
Shape: torch.Size([1, 201, 421])
{% endraw %}

DBScale non-melspectrogram Pipe

{% raw %}
oa = OpenAudio(files)
a2s = SpectrogramTransformer(mel=False)()
db_mel_pipe = Pipeline([oa, a2s])
for i in range(3): 
    print("Shape:", db_mel_pipe(i).shape)
    db_mel_pipe.show(db_mel_pipe(i))
Shape: torch.Size([1, 513, 167])
Shape: torch.Size([1, 513, 112])
Shape: torch.Size([1, 513, 106])
{% endraw %}

Pipe using from_cfg (config)

{% raw %}
#non-mel db-scale spectrogram, warning is expected as f_max is an argument to melspectrograms
cfg = {'mel':False, 'to_db':True, 'n_fft':260, 'f_max':22050., 'hop_length':128}
oa = OpenAudio(files)
a2s = AudioToSpec.from_cfg(cfg)
db_mel_pipe = Pipeline([oa, a2s])
for i in range(3): 
    db_mel_pipe.show(db_mel_pipe(i))
/home/condor/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:45: UserWarning: f_max is not a valid arg name and was not used
{% endraw %}

MFCC Pipe

{% raw %}
db_mfcc_pipe = Pipeline([oa, AudioToMFCC(n_mfcc=40),])
for i in range(3): 
    db_mfcc_pipe.show(db_mfcc_pipe(i))
{% endraw %}

AudioConfig Class

{% raw %}
{% endraw %} {% raw %}

config_from_func[source]

config_from_func(func, name, **kwargs)

{% endraw %} {% raw %}
{% endraw %} {% raw %}

class AudioConfig[source]

AudioConfig()

{% endraw %}
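A plausible sketch of what config_from_func does (the library's version may differ in details): build a config class whose fields mirror a function's keyword defaults, with optional overrides.

```python
import inspect
from dataclasses import field, make_dataclass

def config_from_func_sketch(func, name, **overrides):
    """Hypothetical re-creation of config_from_func, for illustration only."""
    specs = []
    for k, p in inspect.signature(func).parameters.items():
        if p.default is inspect.Parameter.empty:
            continue  # skip required params; configs only carry defaults
        specs.append((k, object, field(default=overrides.get(k, p.default))))
    return make_dataclass(name, specs)

def melspec_stub(sample_rate=16000, n_fft=1024, hop_length=None): pass

Cfg = config_from_func_sketch(melspec_stub, 'BasicMelSpectrogram', n_fft=2048)
cfg = Cfg()
print(cfg.sample_rate, cfg.n_fft, cfg.hop_length)  # 16000 2048 None
```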

{% raw %}
oa(42)
AudioTensor([[0.0000, 0.0000, 0.0000,  ..., 0.0012, 0.0013, 0.0014]])
{% endraw %} {% raw %}
# Basic Mel Spectrogram is just the Torchaudio defaults, which are currently bad, hence
# the empty melbins in the spectrogram below. We can make our own custom good ones like Voice
mel_cfg = AudioConfig.BasicMelSpectrogram()
a2mel = AudioToSpec.from_cfg(mel_cfg)
item0 = AudioTensor.create(ex_files[0])
mel_bad = a2mel(item0)
mel_bad.show()
voice_cfg = AudioConfig.Voice()
a2mel = AudioToSpec.from_cfg(voice_cfg)
mel_good = a2mel(oa(42))
mel_good.show()
{% endraw %}

{% raw %}
test_eq(mel_bad.n_fft, mel_cfg.n_fft)
# hop defaults to None in torchaudio but is set later in the code, we override this default to None
# internally in AudioToSpec to ensure the correct hop_length is stored as a sg attribute
test_ne(mel_bad.hop_length, mel_cfg.hop_length)
print("MelConfig Default Hop:", mel_cfg.hop_length)
print("Resulting Hop:",mel_bad.hop_length)
MelConfig Default Hop: None
Resulting Hop: 200
{% endraw %} {% raw %}
sg_cfg = AudioConfig.BasicSpectrogram()
# make sure mel setting is passed down and is false for normal spectro
test_eq(sg_cfg.mel, False)
{% endraw %} {% raw %}
#Grab a random file, test that the n_fft are passed successfully via config and stored in sg settings
oa = OpenAudio(files)
f_num = random.randint(0, len(files))
sg_cfg = AudioConfig.BasicSpectrogram(n_fft=2000, hop_length=155)
a2sg = AudioToSpec.from_cfg(sg_cfg)
sg = a2sg(oa(f_num))
test_eq(sg.n_fft, sg_cfg.n_fft)
test_eq(sg.width, int(oa(f_num).nsamples/sg_cfg.hop_length)+1)
{% endraw %}
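The width test above relies on how the signal is framed: with centered STFT frames (the torch default), the number of frames depends only on the sample count and hop length. A minimal sketch of that relationship, checked against a shape seen earlier in this notebook:

```python
def expected_sg_width(nsamples, hop_length):
    # with centered frames (the torch default), the number of STFT frames is
    # nsamples // hop_length + 1, independent of n_fft/win_length
    return nsamples // hop_length + 1

# matches the earlier cell: item0 had 58240 samples, hop_length=128 gave width 456
print(expected_sg_width(58240, 128))  # 456
```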

Pipeline examples from Config

{% raw %}
oa = OpenAudio(files)
db_mel_pipe = Pipeline([oa, AudioToSpec.from_cfg(sg_cfg)])
for i in range(3): 
    db_mel_pipe.show(db_mel_pipe(i))
{% endraw %} {% raw %}
voice_config = AudioConfig.Voice(); voice_config
Voice(sample_rate=16000, n_fft=1024, win_length=None, hop_length=128, f_min=50.0, f_max=8000.0, pad=0, n_mels=128, window_fn=<built-in method hann_window of type object at 0x7fe8af6c8880>, wkwargs=None, mel='True', to_db='False')
{% endraw %} {% raw %}
oa = OpenAudio(files)
db_mel_pipe = Pipeline([oa, AudioToSpec.from_cfg(voice_config)])
for i in range(3): 
    db_mel_pipe.show(db_mel_pipe(i))
{% endraw %} {% raw %}
mfcc_cfg = AudioConfig.BasicMFCC()
oa = OpenAudio(files)
mfcc_pipe = Pipeline([oa, AudioToMFCC.from_cfg(mfcc_cfg)])
for i in range(44,47):
    print("Shape", mfcc_pipe(i).shape)
    mfcc_pipe(i).show()
Shape torch.Size([1, 40, 314])
Shape torch.Size([1, 40, 199])
Shape torch.Size([1, 40, 266])
{% endraw %}

Export